Proteins: Structure, Function, and Bioinformatics

Wiley

Preprints posted in the last 30 days, ranked by how well they match Proteins: Structure, Function, and Bioinformatics's content profile, based on 82 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
AlphaFold3 predicted LWO G-protein complex from European robin features active-state biased Gα

Hungerland, J.; Kostritski, A.; Koch, K.-W.; Solov'yov, I.

2026-05-20 biophysics 10.64898/2026.05.19.726335 medRxiv
Top 0.1%
12.3%

Avian phototransduction and magnetoreception have been proposed to involve shared retinal proteins, including interactions between long-wavelength opsin (LWO), the cone-specific heterotrimeric G protein (Gt), and cryptochrome 4a (Cry4a), yet structural information on avian phototransduction complexes is lacking. Here we present and critically assess two atomistic models of the European robin LWO-Gt complex generated by distinct modelling strategies. A full-complex prediction using AlphaFold3 yields a tightly packed, structurally stable interface but exhibits pronounced activation-like conformational features of the Gt-subunit that persist in simulations of the isolated protein, revealing a strong bias toward the active state. In contrast, a template-guided assembly based on single-chain predictions and an experimental rhodopsin-Gt reference structure forms a weaker interface and shows no intrinsic activation bias, while still displaying subtle activation-related dynamics. These results demonstrate that machine-learned complex prediction can encode functional states independently of the local interaction environment, thereby limiting its interpretability for signalling mechanisms that hinge on activation equilibria. Our findings highlight the need for explicit assessment of conformational-state bias when modelling regulatory protein assemblies and provide a structural framework for future studies of Cry4a-dependent modulation of retinal G-protein signalling in avian magnetoreception.

2
The Rossmann2x2 Fold Attains its Native Structure Via a Defined Pathway of Sequential and Cooperative Folding Units

Bustamante, C. J.

2026-05-22 biophysics 10.64898/2026.05.21.726993 medRxiv
Top 0.1%
8.3%

Despite progress in predicting protein structures, how proteins arrive at their native state remains a subject of continuous debate. We present a single-molecule force spectroscopy study of the unfolding and refolding intermediates of the conserved, diverse, and ancient Rossmann2x2 fold (β1α2β3α4β5α6β7α8). By inserting glycines at different locations in the protein, we can follow in real time and annotate its unfolding and refolding intermediates. This protein folds along a single reversible pathway involving the ordered and sequential organization of discrete and cooperative folding units or foldons: unfolded ⇄ β1α2β3 ⇄ β1α2β3α4β5 ⇄ β1α2β3α4β5α6β7 ⇄ β1α2β3α4β5α6β7α8. This strict order results from the formation of an autonomously folding unit (primary foldon) and the subsequent organization of elements (secondary foldons) whose stability depends on their interactions with previously organized ones.

3
Simple baselines rival protein language models in mutation-dense design tasks

Talpir, I.; Fleishman, S. J.

2026-05-06 bioinformatics 10.64898/2026.05.01.722313 medRxiv
Top 0.1%
8.2%

Computational protein design demands generally applicable models that reliably predict or generate unmeasured variants with superior functional properties. Although protein language models (pLMs) have been used in zero-shot and transfer-learning design studies, they have generally not been assessed in benchmarks that explicitly test combinatorial extrapolation from lower- to higher-order variants. Here we benchmark widely used pLMs against conventional baseline methods in recently described dense, experimentally validated multi-mutant landscapes. We find that regardless of architecture and parameter count, pLMs are statistically similar to one another, and none consistently outperforms conventional baseline methods. Furthermore, their ability to distinguish functional from non-functional variants in zero-shot prediction is comparable to that of conventional homology-based methods. We suggest that to contribute significantly to the design of protein function, pLMs may need to encode biophysical and structural priors or be combined with structure-based approaches.

4
Structural bias in machine learning-guided peptide design

Aldas-Bulos, V. D.; Plisson, F.

2026-05-08 bioinformatics 10.64898/2026.05.06.721805 medRxiv
Top 0.1%
6.4%

Machine learning continues to accelerate peptide and protein design through the rapid prediction and generation of sequences with desired characteristics. Many applications focus on predicting properties, functions, and structures, as well as generating point mutations and de novo designs. Nevertheless, many models prove less generalizable than initially claimed. Most predictors and generators are trained on sequential datasets, where imbalances can be addressed during preprocessing. In contrast, structural bias, a subtype of algorithmic bias arising from uneven representation of structural classes in training datasets, and the limitations of early protein structure predictors have frequently remained undetected and uncorrected. The recent surge in powerful protein structure prediction tools, such as the AlphaFold and RosettaFold series and their variants, now presents opportunities to mitigate this issue. We hypothesize that such structural sampling biases influence the downstream performance of ML models. Using antimicrobial peptides as a case study, we audited the structural biases in 16 state-of-the-art predictors for antimicrobial activity and tested whether structural information constrains their predictions. Our analysis revealed that models explicitly trained on sequential data still produce predictions biased by uneven fold representations and data leakage. These findings highlight the importance of integrating balanced structural data or implementing bias-mitigating strategies to develop agnostic models that maximize bioactive protein discovery and multi-objective optimization.

5
Phylogenetic Analysis and Structural Evaluation of Staphylococcus aureus Serine-Aspartate Repeat-Containing Protein D with a Focus on Periprosthetic Joint Infection

Joachimiak, A.; Tan, K.; O'Connor, K. A.; Zhou, X.; Gade, P.; Garcia, E.; Tan, A.; Nijhawan, A.; Endres, M.; Kim, Y.; Greenwood-Quaintance, K.; Patel, R.

2026-05-05 biophysics 10.64898/2026.05.01.722179 medRxiv
Top 0.1%
4.1%

Serine-aspartate repeat-containing protein D (SdrD) is a Staphylococcus aureus cell wall-anchored, calcium-binding adhesin member of the MSCRAMM Sdr subfamily that may contribute to bacterial adhesion and virulence. S. aureus is the most common cause of periprosthetic joint infection (PJI). Population-level distribution and sequence diversity of SdrD among clinical PJI isolates have not been systematically characterized, and the SdrD binding mechanism is still not well understood. To address these gaps, sdrD alleles were queried across 156 newly sequenced PJI isolates and compared to publicly available S. aureus genomes, and nucleotide- and protein-level phylogenies of the sdrCDE locus were constructed. The SdrD crystal structure from S. aureus JH1 was determined and complemented by solution small-angle X-ray scattering (SAXS), molecular dynamics (MD) simulations, and an assessment of conformational changes upon calcium depletion. Three dominant sdrD subtypes were defined, associating with USA300, JH1, and TCH60; the JH1 sdrD subtype was predominant among PJI isolates. Structural studies showed that the conformation of individual domains and interdomain organization of the multidomain SdrD have limited flexibility in solution, and that the calcium-binding B domain retains its core fold under conditions of calcium depletion. Together, the findings presented support functional diversification among Sdr family members in mediating host attachment and inform a re-evaluation of the ligand-binding mechanism previously proposed for SdrD. AUTHOR SUMMARY: Staphylococcus aureus is the leading cause of infections that develop around joint implants (periprosthetic joint infection, PJI). This bacterium has a large arsenal of surface proteins that allow it to stick to human tissues and implanted devices. This work focused on one such protein, SdrD, which has been linked to implant-associated infections but the structure and diversity of which among patients with PJI had not been well characterized.
The genetic sequences of SdrD were analyzed across thousands of bacterial genomes, including those from patients with PJI. Distinct genetic variants of the protein were found, one of which was particularly common with PJI. The three-dimensional structure of SdrD was determined at atomic resolution, and solution small-angle X-ray scattering (SAXS) and molecular dynamics were used to study how it moves and responds to changes in its environment. Contrary to what was previously described, SdrD was shown to be relatively rigid. These findings change how SdrD's mechanism of action should be considered, potentially informing design strategies to block bacterial attachment before infection takes hold.

6
Deciphering conformational preferences of RNA in protein-RNA recognition

Kant, S.; Masipeddi, S.; Bahadur, R. P.

2026-05-15 biophysics 10.64898/2026.05.14.725147 medRxiv
Top 0.1%
4.0%

Conformational plasticity of RNAs plays important roles in recognizing RNA-binding proteins, and is often modulated by their binding partners. Here, we investigate RNA conformational preferences in a non-redundant dataset of 263 protein-RNA complexes to characterize the structural landscape associated with protein recognition. RNA dinucleotide segments are analyzed using seven backbone torsion angles (δ1, ε1, ζ1, α2, β2, γ2, and δ2), two glycosidic torsion angles (χ1 and χ2) and the pseudo-torsion angle. Focusing on dinucleotide steps present in both interface and non-interface regions, we performed density-based clustering using selected backbone torsion angles to identify recurrent conformational states. We identify 28 distinct RNA dinucleotide conformers containing at least ten members each. Among these, eight conformers represent previously unreported nucleotide conformers (NtCs), including the transitional and the non-canonical states AB06, AB07, BB21, BB22, OP32, OP33, IC08 and IC09. Several of these conformers are preferentially enriched at protein-binding interfaces, suggesting their involvement in local conformational adaptation during protein-RNA recognition. The newly identified conformers span transitional A-B geometries, distorted B-like states, open conformations and compact intercalated structures, highlighting the remarkable structural plasticity of RNA in ribonucleoprotein complexes. Overall, this study expands the current understanding of RNA conformational space and provides a refined RNA dinucleotide conformer library for protein-RNA complexes. These findings will facilitate the identification of novel RNA structural motifs and improve RNA structural modeling, docking of protein-RNA complexes, and deep learning-based prediction frameworks for describing RNA tertiary structures.
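
Clustering on torsion angles needs a distance that respects periodicity, since 350° and 10° are only 20° apart. The abstract does not say which metric the authors used, so the following is only a minimal sketch of one common choice, a wrapped Euclidean distance over a vector of torsions in degrees. The function names and the toy values are hypothetical.

```python
import math

def angular_diff(a, b):
    """Smallest absolute difference between two angles given in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def torsion_distance(u, v):
    """Euclidean distance over per-angle wrapped differences (assumed metric)."""
    return math.sqrt(sum(angular_diff(a, b) ** 2 for a, b in zip(u, v)))

# Hypothetical two-angle example: only the first torsion differs, across the 0/360 wrap.
step_a = (350.0, 60.0)
step_b = (10.0, 60.0)
d_ab = torsion_distance(step_a, step_b)
```

A distance of this kind could be passed to any density-based clustering routine that accepts a precomputed distance matrix.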

7
COSMIC-Linked Ras Mutations at the Interface Between H-Ras and PI3KγRBD Frequently Generate Affinity Increases as Well as Affinity Decreases

Mead, E. H.; Batz, K. C.; Shih, K.-H.; Fleming, I. R.; Tesdahl, C. D.; Lizardos, L.; Armendariz, J. R.; Hannan, J. P.; Hickey, A. M.; Leyk, A.; Erbse, A. H.; Falke, J. J.

2026-05-06 biochemistry 10.64898/2026.05.01.722339 medRxiv
Top 0.2%
3.8%

The three conventional isoforms of the Ras G-protein (H-, K-, N-Ras) function as molecular on-off switches that regulate a wide array of signaling pathways, including the Ras-PI3K-PIP3-PDK1-AKT pathway that is central to innate immunity and normal cell growth, and is dysregulated in many disease states. Activation of the pathway by Ras requires adequate Ras-PI3K binding affinity. Here we focus on the interface of known structure in the H-Ras:PI3Kγ co-complex essential to multiple pathways including directed pseudopod growth in leukocyte chemotaxis. At this interface 10 H-Ras residues, all 100% conserved between the H-, K- and N-Ras isoforms, contact the Ras binding domain of PI3Kγ (PI3KγRBD). To investigate the degree to which the native H-Ras:PI3KγRBD interface is optimized by evolution for maximal binding affinity, 8 interfacial Ras mutations selected from the COSMIC database and the literature were introduced at the contact positions. All 8 Ras mutations were observed to alter the H-Ras:PI3KγRBD binding affinity, with 4 mutations yielding significant affinity increases and 4 yielding significant affinity decreases. These findings indicate that the native H-Ras:PI3KγRBD interface provides intermediate, rather than maximal, binding affinity. Such intermediate affinity is consistent with the substantial binding plasticity of the conserved H-, N-, K-Ras effector docking surface, which has evolved to bind a diverse array of effectors. Furthermore, the findings provide evidence that COSMIC-linked mutations at the H-Ras:PI3KγRBD interface frequently generate affinity increases as well as decreases, with potential implications for molecular mechanisms of disease and for tool development in cell biology.

8
Assembly-active and -inactive forms of HBV capsid protein provide distinctly different binding sites for capsid assembly modulators

Scott, L. W.; Perez-Segura, C.; Hadden-Perilla, J.; Zlotnick, A.

2026-05-14 biochemistry 10.64898/2026.05.13.724798 medRxiv
Top 0.2%
3.6%

In an infection, Hepatitis B Virus (HBV) core protein (HBc) normally assembles into icosahedral capsids. Capsid Assembly Modulators (CAMs) are direct-acting antivirals that induce HBc mis-assembly and are the subject of active research and development. Two versions of HBc are used in structural studies of CAM-HBc complexes: Cp150 and Cp149-Y132A. Cp150 forms empty icosahedral capsids that are structurally indistinguishable from those found in virions. The Y132A mutation of Cp149 leads to an assembly-defective soluble protein that crystallizes as flat hexagonal sheets, where the hexagons resemble icosahedral quasi-sixfold vertices. In this study, we compare structures of CAM-bound Cp150 to CAM-bound Cp149-Y132A. In capsids, the residues forming the CAM site shift to match the structure of bound CAMs, an induced fit. In Cp149-Y132A crystals, CAM sites show little structural adjustment in response to different CAMs binding. In turn, the array of residues that interact with CAMs varies from CAM to CAM in capsid structures but remains nearly constant in Cp149-Y132A crystals. These results illustrate important differences between CAM binding in Cp149-Y132A and Cp150 structures that will contribute to future CAM design.

9
Mutation-Induced Pocket Deactivation: How Ser353/Pro245 Alters KCa2.2 vs KCa3.1 Ligand Selectivity

Gozzi, M.; Massa, J.; Koch, O.

2026-05-06 pharmacology and toxicology 10.64898/2026.05.03.722491 medRxiv
Top 0.2%
3.5%

The KCa2.2 and KCa3.1 channels are fundamental regulators of cellular K+ concentration and promising targets for treating diseases such as spinocerebellar ataxia and cancer. To fully exploit their therapeutic potential, and to continue studying their pathophysiological role, it is crucial to develop selective modulators for each of these two channels. Here we present a computational study to identify the molecular determinants behind the selectivity of two recently reported KCa2.2 modulators. We leveraged a protocol combining in silico mutagenesis, molecular dynamics simulations, and protein-ligand docking to analyse the pockets targeted by these ligands. We identified a Ser353/Pro245 substitution to be the main driver of the distinct pocket shapes in KCa2.2 and KCa3.1 channels, ultimately defining modulator selectivity. This approach provides novel insights into the structural differences of this binding site across potassium channel subtypes, shedding light on the selectivity determinants of modulators targeting this pocket.

10
Specificity Profiling of the RhoGEF Domain of EhFP10 with EhRho GTPases Involved in Cytoskeleton Remodeling

Gautam, A. K.; umarao, P.; Gourinath, S.

2026-05-12 biochemistry 10.64898/2026.05.08.723678 medRxiv
Top 0.2%
3.1%

The Rho family of small GTPases plays a critical role in regulating actin cytoskeleton dynamics during endocytic processes in E. histolytica, including phagocytosis, pinocytosis, and trogocytosis. These proteins act as molecular switches, transitioning between inactive GDP-bound and active GTP-bound states, with guanine nucleotide exchange factors (GEFs) catalyzing this transition. Among the GEFs, EhFP10, a FYVE-domain-containing protein harbouring Dbl homology (DH) and pleckstrin homology (PH) domains, was observed in phagocytosis along with seven functionally characterized Rho GTPases (EhRho1, EhRho2, EhRho4, EhRho5, EhRho6, EhRho8, and EhRho13). To study the specificity of FP10, a combination of GEF activity, binding affinity, and molecular dynamics simulations was used to characterize the interactions between EhFP10 and seven Rho GTPases systematically. The results revealed EhRho2 as the most specific and high-affinity interactor of EhFP10, with the highest nucleotide exchange rate and lowest dissociation constant (KD = 0.58 µM). Structural modeling, sequence alignment, and interaction mapping further demonstrated that EhRho2 retains critical contact residues (such as Glu33, Arg4, and Leu69) that are variably absent in other isoforms, correlating with decreased GEF responsiveness. Molecular dynamics simulations and cross-correlation analyses supported the presence of a stable and coordinated interaction interface in the EhFP10-EhRho2 complex, distinguishing it from less active complexes. These findings indicate a highly selective GEF-GTPase module in E. histolytica, analogous to those in higher eukaryotes. The results uncover a potential regulatory mechanism specific to pathogenic amoebae and present EhFP10-EhRho2 as a novel therapeutic target for disrupting cytoskeleton-mediated processes crucial to virulence.

11
Antimicrobial peptide databases and prediction tools: Toward a standard evaluation framework

Cisterna Garcia, A.; Gonzalez Lopez, A. M.; Vozi, A.; Esteban, M. A.; Egli, A.; Jutzeler, C.; Palma, J.; Sanchez-Ferrer, A.; Botia, J. A.

2026-05-21 bioinformatics 10.64898/2026.05.19.726290 medRxiv
Top 0.2%
2.8%

Antimicrobial resistance (AMR) has a profound impact on animal and human health and is associated with substantial morbidity, mortality and public health costs. There is a clear need to develop novel, effective antibiotic agents, which can overcome the current AMR crisis. Antimicrobial peptides (AMPs) may offer such a solution and have attracted growing attention for their potential to combat AMR. In parallel, the growing availability of peptide sequences in public databases has stimulated the development of numerous machine learning and deep learning tools to predict antimicrobial activity computationally. However, it remains unclear how reliably these tools can be compared, as existing studies often rely on heterogeneous datasets and inconsistent evaluation protocols that may lead to data leakage and inflated performance estimates. This raises a central question: what evaluation criteria and benchmark resources are needed to enable fair, reproducible, and biologically meaningful assessment of AMP prediction tools? We address this question by focusing specifically on antibacterial peptides (ABPs). We first provide an overview of AMP databases relevant to antibacterial activity and compare their content, redundancy, and experimental metadata. We then critically assess existing computational tools for ABP prediction, highlighting key limitations related to dataset construction, affinity to certain sequences, data leakage, and inconsistent performance reporting. Based on these limitations, we propose a reference evaluation framework designed to improve comparability, reproducibility, and practical utility in ABP prediction. Finally, we provide targeted recommendations for AMP databases and future tool development to support more robust progress in the computational discovery of ABPs.

12
A Minimal Chemo-mechanical Markov Model for Rotary Catalysis of F1-ATPase

Chen, Y.; Grubmüller, H.

2026-05-18 biophysics 10.1101/2025.06.26.661389 medRxiv
Top 0.3%
2.6%

F1-ATPase, the catalytic domain of ATP synthase, is pivotal for mechanochemical energy conversion in mitochondria. Aiming at a minimal yet quantitative and thermodynamically consistent model for its rotary catalysis mechanism, here we developed a chemo-mechanical Markov model incorporating essential conformational and chemical degrees of freedom. By systematically evaluating over 14,000 model variants via Bayesian inference and cross-validation, we find that a fully functional minimal model requires four functionally distinct β-subunit conformations. Our model reconciles the decade-long bi-site versus tri-site controversy, showing that both pathways contribute depending on ATP concentration. Furthermore, our model suggests a Brownian-ratchet-like mechanism that explains the observation that one ATP hydrolysis event can trigger rotations larger than 120°, thereby accounting for efficiencies that seemingly exceed 100%. Beyond this prototypic example of a complex biomolecular machine, our approach should enable one to study other enzymatic mechanisms that implement close coupling between conformational motions, substrate binding, and chemical reactions.

13
Molecular clockwork hypothesis for the KaiABC circadian oscillations

Sasai, M.; Fujishiro, S.

2026-05-12 biophysics 10.64898/2026.05.07.723666 medRxiv
Top 0.3%
2.6%

When three cyanobacterial proteins (KaiA, KaiB, and KaiC) are incubated with ATP in vitro, the phosphorylation level of KaiC exhibits stable circadian oscillations. Biochemical and structural analyses have shown that KaiC's ATPase activity is crucial for these oscillations, leading to the hypothesis that ATP-consuming dynamics function as a molecular clock, determining the oscillation period of individual molecules. Moreover, these molecular clocks synchronize with one another, resulting in collective oscillations at the ensemble level. In this study, we develop a theoretical model to test this molecular clockwork hypothesis. Our model clarifies the relationship between the oscillation period and ATPase activity, explaining the significant changes in the period induced by amino-acid substitutions near the CI-CII domain boundary of the KaiC hexamer. Furthermore, the model addresses the physical basis for temperature compensation concerning both the oscillation period and ATPase activity. Thus, the molecular clockwork perspective provides a framework for understanding the atomic design behind collective oscillations.

14
AI-derived Protein Structures Validation: AlphaFold2 Models in the Twilight Zone

Griffin, P.; Deganutti, G.; Jadeja, K.; Idigbe, C.; Pipito', L.; Mejuto, L.; Ng, C. P.; Peck, S.; Greaves, J.; Reynolds, C. A.

2026-05-12 bioinformatics 10.64898/2026.05.12.724499 medRxiv
Top 0.3%
2.6%

In any field, unquestioningly accepting artificial intelligence (AI) results should be considered bad practice. Here, we devised a comparative modelling-based strategy for validating protein structures that exploits the well-known observation that protein folds are far more conserved than protein sequences. We identify proteins with a similar fold to the AlphaFold-generated query protein and determine their structural alignment to the query. The hypothesis is that if the sequence alignment coincides with the structural alignment, then the structure is validated. The strategy is implemented on a helix-by-helix and strand-by-strand basis using a multi-template pairwise local profile alignment method that works well into the twilight zone. The method is illustrated by application to the transmembrane transporter PEPT1, for which the structure is known, and the S-deacylases ABHD13 and ABHD16A, for which only AI-generated models exist. ABHD16A is particularly challenging because a sequence alignment search with BLASTp does not reveal any structural homologues and therefore requires work with extremely remote homologues; however, both models are validated through this strategy and are stable during classical molecular dynamics simulations. The ability of the strategy to identify errors is assessed with reference to misaligned ABHD13 models and misfolded decoy proteins.

15
An engineered disulfide staple restricts lid loop dynamics and alters substrate specificity of phenylalanine ammonia-lyase

Condruti, R.; Muthuraj, L.; Prakash, J. K.; Littman, S. D.; Kumar R., P.; Nair, N. U.

2026-05-06 bioengineering 10.64898/2026.05.01.722275 medRxiv
Top 0.3%
2.6%

In Anabaena variabilis (Trichormus variabilis) phenylalanine ammonia-lyase (AvPAL), a conserved lid-like loop sits over the active site and has been studied both for its role in positioning a catalytic tyrosine and for its contribution to phenylalanine aminomutase (PAM) activity. While the active site architecture and substrate specificity of AvPAL have been extensively characterized, the dynamic behavior of this unstructured loop beyond its role in catalysis remains poorly understood. Here, we investigate the functional role of this loop by restricting its mobility through targeted interchain disulfide bond engineering. Three in-house approaches were designed to predict ideal cysteine residue pairs: (i) quantifying pair interaction energies via electrostatic and van der Waals forces, (ii) generating a contact map of residues within 5 Å proximity, and (iii) implementing a machine-learning model trained on datasets from PDBCYS, SPX, and an internal database to rank cysteine pair likelihood within disulfide bond geometric constraints. Our machine-learning-guided strategy yielded a successful variant with complete oxidation efficiency in E. coli. Rigidification of this loop reveals that it also functions as a regulator of substrate specificity. Multi-scale molecular simulation analyses (molecular dynamics, metadynamics, quantum/molecular mechanics) reveal that this modification alters the active-site pocket by reducing the conformational dynamics of substrate binding. Our findings underscore the delicate balance between enzyme flexibility and catalytic efficiency, providing novel insights into the role of this understudied dynamic loop region in AvPAL.
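
Approach (ii) above, a contact map at a 5 Å cutoff, can be illustrated with a short sketch. This is not the authors' in-house tool: it assumes one representative atom per residue, uses hypothetical residue labels of the form "chain:number", and runs on made-up coordinates rather than AvPAL data.

```python
from itertools import combinations
from math import dist

def contact_pairs(coords, cutoff=5.0):
    """Return residue-label pairs whose representative atoms lie within `cutoff` Å."""
    pairs = []
    for (id_a, xyz_a), (id_b, xyz_b) in combinations(sorted(coords.items()), 2):
        if dist(xyz_a, xyz_b) <= cutoff:
            pairs.append((id_a, id_b))
    return pairs

# Hypothetical toy coordinates in Å (one atom per residue), not taken from any structure.
toy = {
    "A:10": (0.0, 0.0, 0.0),
    "A:11": (3.8, 0.0, 0.0),
    "B:85": (0.0, 4.5, 0.0),
    "B:90": (20.0, 0.0, 0.0),
}

candidates = contact_pairs(toy)
# Keep only pairs that span two chains, as interchain disulfide candidates would.
interchain = [(a, b) for a, b in candidates if a.split(":")[0] != b.split(":")[0]]
```

A real screen would also need disulfide geometry checks (for example on side-chain distances and dihedrals); the cutoff alone only narrows the list of candidate pairs.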

16
Reparameterization of the Amber RNA Force Field Non-Bonded Terms

Puthenpeedikakkal, A. M. K.; Cavender, C. E.; Smith, L. G.; Grossfield, A.; Mathews, D.

2026-05-19 biochemistry 10.64898/2026.05.18.725894 medRxiv
Top 0.3%
2.1%

All-atom simulations of RNA using molecular dynamics have the promise of modeling conformational preferences, folding thermodynamics, conformational change kinetics, and binding affinities of small molecule therapeutics. These simulations rely on a force field, a set of equations and parameters that model the potential energy as a function of conformation using classical mechanics. One popular force field for RNA is Amber OL3, with the most recent iteration derived in 1999 and with subsequent updates to backbone dihedral parameters. The Amber force field, while frequently used, is known to have limitations; for example, it does not properly stabilize native structures against alternative structures. Here, we provide a new approach to fitting the non-bonded parameters for the force field, specifically atom-centered point charges for electrostatics and the Lennard-Jones parameters. The parameters are fit to quantum mechanics (QM) interaction energies calculated with symmetry-adapted perturbation theory (SAPT), including embedded point charges to represent the electrostatic field from solvent and adjacent nucleotides. In this pilot study with a limited set of fitting data, we use the Amber ff99 equations and atom types unchanged. With the revised parameters, we observe improvement in the stability of native structures relative to alternative structures. Native tetraloop conformations, which unfold with the Amber OL3 force field, are stable on the microsecond timescale with our new force field parameters. We also see improvement in the conformational preferences of tetramers. Crucially, A-form helices are still well-modeled, but we observe additional flexibility in an internal loop that is not consistent with NMR data. Overall, we provide evidence that this new approach to fitting RNA force field parameters to SAPT interaction energies with native-structure context represented as embedded point charges is promising. 
It offers a flexible solution for revising the equations in future work or for extension to other molecules that interact with RNA, such as proteins and small molecules. We call this new set of force field parameters Amber RNA.ROC26.

17
Benchmarking Boltz-2 for Screening of Therapeutic Antibody-Antigen Interactions

Fieux-Castagnet, A.; Waton, J.; Glukhonemykh, A.; Snow, E.; Ashokkumar, R.; Fleming, J.; Champagne, D.; Devenyns, T.; Peluffo, A.; Anagnostopoulos, C.

2026-05-14 bioinformatics 10.64898/2026.05.13.724924 medRxiv
Top 0.4%
1.9%

Protein structure prediction models (such as AlphaFold, Chai, Boltz) have transformed structural biology and are increasingly explored for drug discovery; however, their utility for large-scale screening of antibody-antigen (AB-AG) interactions remains unclear, particularly for distinguishing true binding from non-binding pairs at scale. To our knowledge, there has not been an exhaustive exploration of Boltz-2 inference settings on this high-impact problem, and in this paper we set out to describe and implement a novel benchmarking framework that can accelerate progress in the field. We evaluated Boltz-2 (NVIDIA NIM implementation) on 519 therapeutic monoclonal antibodies from Thera-SAbDab, pairing each antibody with its cognate target and a randomly assigned non-cognate antigen. We developed a novel evaluation framework that systematically captures variability across stochastic seeds while benchmarking different inference settings, including datasets with and without crystallographically resolved antibody structures. Across settings, Boltz-2-derived confidence metrics showed weak, though above-chance, discrimination (0.5 < ROC-AUC < 0.60). Among evaluated metrics, the minimum value of the interface predicted TM-score (ipTM-min) across seed-samples captured the strongest signal. Interestingly, additional feature aggregation and multivariate modelling provided little to no improvement. Increasing the number of stochastic predictions yielded front-loaded gains, with diminishing returns beyond ~15-20 seed-samples, suggesting limited value of extensive sampling in practical workflows. Notably, inference without multiple sequence alignments (MSAs) slightly improved performance on non-crystallized antibodies (ΔAUROC ≈ +0.027) while reducing runtime by ~8 seconds per prediction compared to shallow MSA settings.
Overall, these results indicate that off-the-shelf confidence metrics from general-purpose structure prediction models may be insufficient for reliable target-antibody screening and highlight the need for task-specific optimization, while confirming that modest amounts of sampling can be helpful, but not in itself sufficient to improve performance significantly as gains plateau relatively quickly.
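
The ipTM-min metric and the ROC-AUC comparison described in this abstract can be sketched in a few lines. This is an illustrative assumption about the computation, not the authors' benchmarking framework: the antibody labels and ipTM values are invented, and the AUC is computed with the standard pairwise-ranking definition, counting ties as one half.

```python
def min_over_seeds(scores_by_pair):
    """Collapse per-seed ipTM values into one ipTM-min score per antibody-antigen pair."""
    return {pair: min(scores) for pair, scores in scores_by_pair.items()}

def roc_auc(pos, neg):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical ipTM values over three seeds per pair (invented numbers).
cognate = {"mAb1": [0.62, 0.55, 0.70], "mAb2": [0.40, 0.45, 0.38], "mAb3": [0.81, 0.77, 0.79]}
non_cognate = {"mAb1": [0.50, 0.42, 0.47], "mAb2": [0.44, 0.39, 0.52], "mAb3": [0.30, 0.35, 0.33]}

pos_scores = list(min_over_seeds(cognate).values())
neg_scores = list(min_over_seeds(non_cognate).values())
auc = roc_auc(pos_scores, neg_scores)
```

Taking the minimum over seeds is a conservative aggregate: a pair scores well only if every sampled prediction is confident, which is one plausible reason such a metric could separate cognate from non-cognate pairs.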

18
Drug design using unique conformations to preferentially target a specific site on collagen-bound MMP1

Sarkar, S. K.; Nash, A.; Harms, C.

2026-05-17 biophysics 10.64898/2026.05.14.725194 medRxiv
Top 0.4%
1.9%

Precise site-specific drug design remains a challenge in structure-based drug discovery. Most existing approaches screen for ligands to target binding pockets on a protein surface based on static structures obtained from techniques such as X-ray, NMR, cryo-EM, and AlphaFold. However, the structure-function paradigm is, in reality, a structure-dynamics-function relationship that determines a protein's binding and activity. As such, drug screening or design without evaluating binding competition across the protein surface or considering the receptor's dynamic, substrate-dependent conformational states is incomplete. Substrate-specific unique protein conformations are underexplored and offer novel opportunities for selective therapeutic targeting, though systematic workflows for identifying and exploiting such sites remain limited. Previously, we showed that collagen alters matrix metalloprotease-1 (MMP1) dynamics and that R405 is an allosteric residue on the MMP1 surface that exhibits strong dynamic correlations with its active site. Here, we present a substrate-specific allosteric drug-design framework that targets specific sites on a protein, using collagen-bound MMP1 as a model system. We determined the conformational dynamics of free and collagen-bound MMP1 using all-atom molecular dynamics (MD) simulations and grouped similar conformations into clusters. We then compared the two ensembles and identified unique conformations that occur only in collagen-bound MMP1, and designed drugs against them using a machine-learning approach. The top three unique clusters were used to generate approximately 150,000 candidate compounds that were then screened against both the R405-centered region and all detectable binding pockets across the MMP1 surface. We found several compounds that bind preferentially around R405 by at least 0.3 kcal/mol relative to competing sites across the surface.
This strategy establishes a generalizable framework for designing ligands that preferentially target substrate-specific allosteric sites, providing new opportunities for precision therapeutics that modulate proteins in their biologically relevant functional states.
Simple Summary: In this paper, we establish a substrate-specific allosteric drug-design strategy that integrates all-atom molecular dynamics simulations, conformational clustering, machine-learning-based ligand design, and surface-wide binding-selectivity screening, using collagen-bound MMP1 as a model system. We show that collagen binding reshapes the conformational ensemble of MMP1, creating unique conformational states that are absent or inaccessible in the free enzyme. By identifying these substrate-specific conformations, generating ligands based on the corresponding dynamic fingerprints around the collagen-specific allosteric residue R405, and screening compounds across all binding pockets on the MMP1 surface, we demonstrate preferential targeting of the collagen-specific site relative to competing pockets. These results establish a generalizable framework for designing ligands that selectively recognize biologically relevant substrate-bound conformations rather than static protein structures alone. Substrate-specific allosteric targeting may enable selective modulation of individual protein functions while minimizing off-target interactions, providing new opportunities for precision therapeutics against dynamic protein systems.
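The surface-wide selectivity criterion described in this abstract (preferential binding at the R405 region by at least 0.3 kcal/mol over competing sites) can be sketched as a simple margin test. This is only an illustration: it assumes binding scores are energies in kcal/mol where more negative means stronger binding, and the function names and example energies are hypothetical rather than taken from the preprint.

```python
from typing import Sequence


def preferential_margin(target_energy: float,
                        competing_energies: Sequence[float]) -> float:
    """Energy gap (kcal/mol) between the best competing pocket and the target site.

    Assumes more negative energies mean stronger binding, so a positive
    margin means the target site is favoured over every competing pocket.
    """
    if not competing_energies:
        raise ValueError("need at least one competing pocket")
    best_competitor = min(competing_energies)
    return best_competitor - target_energy


def is_preferential(target_energy: float,
                    competing_energies: Sequence[float],
                    threshold: float = 0.3) -> bool:
    """True if the target site wins by at least `threshold` kcal/mol."""
    return preferential_margin(target_energy, competing_energies) >= threshold


# Hypothetical docking energies for two compounds (kcal/mol).
compound_a = {"target": -8.1, "others": [-7.5, -6.9]}  # target favoured by ~0.6
compound_b = {"target": -7.0, "others": [-6.9, -7.4]}  # a competing pocket wins

keep_a = is_preferential(compound_a["target"], compound_a["others"])
keep_b = is_preferential(compound_b["target"], compound_b["others"])
print(keep_a, keep_b)
```

Comparing against the single best competing pocket, rather than the average, is the stricter reading of "relative to competing sites across the surface"; the abstract does not say which reading the authors used.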

19
Redesign selective protein binders using contrastive decoding

Xie, Z.; Xu, J.

2026-05-13 bioinformatics 10.64898/2026.05.09.722041 medRxiv
Top 0.4%
1.8%

Motivation: Fixed-backbone sequence design methods such as ProteinMPNN operate on backbone coordinates alone and cannot represent target side-chains at the binding interface. Their decoding algorithm also lacks a mechanism to balance binding affinity and folding stability or to improve selectivity against structurally similar off-targets. These gaps limit the computational design of protein binders with high affinity and specificity. Results: We present RedNet, a multiscale graph neural network that encodes side-chain information of the binding target. We further develop a contrastive decoding algorithm, motivated by the thermodynamic decomposition of binding free energy, that addresses two objectives: (1) balancing binding affinity and folding stability, and (2) improving selectivity against structurally similar off-targets. RedNet reaches 43% native sequence recovery on heterodimers, compared with 37% for ProteinMPNN and 33% for ESM-IF. With contrastive decoding, it matches native-sequence co-folding success (68%) on high-confidence AlphaFold3 targets, exceeding ProteinMPNN (59%) and ESM-IF (61%). On a new benchmark of structurally similar on-/off-target pairs, RedNet with contrastive decoding reaches 64.8% energetic selectivity, ahead of PiFold (55.6%), ProteinMPNN (53.7%), and ESM-IF (53.7%). Availability: Source code and datasets are released at https://github.com/zw2x/rednet_public. Contact: jinbo.xu@gmail.com
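The abstract does not spell out the contrastive decoding rule, so the following is only a generic sketch of the idea, not RedNet's algorithm: at one sequence position, score each amino acid by its log-probability given the on-target context minus a weighted log-probability given an off-target context, and pick the highest-scoring residue. The weight `alpha`, the three-letter alphabet, and the logits are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_scores(on_logits: Sequence[float],
                       off_logits: Sequence[float],
                       alpha: float = 1.0) -> List[float]:
    """log p(residue | on-target) - alpha * log p(residue | off-target)."""
    on_lp = log_softmax(on_logits)
    off_lp = log_softmax(off_logits)
    return [a - alpha * b for a, b in zip(on_lp, off_lp)]


def pick_residue(on_logits: Sequence[float],
                 off_logits: Sequence[float],
                 alphabet: str,
                 alpha: float = 1.0) -> str:
    """Greedy choice of the residue with the highest contrastive score."""
    scores = contrastive_scores(on_logits, off_logits, alpha)
    best = max(range(len(scores)), key=lambda i: scores[i])
    return alphabet[best]


# Hypothetical logits for one position over a toy alphabet.
alphabet = "ADK"
on_target = [2.0, 1.8, 0.1]   # A and D both fit the intended target
off_target = [2.0, 0.2, 0.1]  # A also fits the off-target, D does not

plain = pick_residue(on_target, off_target, alphabet, alpha=0.0)
contrast = pick_residue(on_target, off_target, alphabet, alpha=1.0)
print(plain, contrast)
```

With `alpha=0.0` the rule reduces to ordinary greedy decoding on the on-target logits; raising `alpha` penalizes residues that the off-target context also favours.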

20
Deep Learning Structural Ensembles as Proxies for Protein Flexibility

Tunc, M. T.; Dizkirici Tekpinar, A.; Tekpinar, M.

2026-05-18 bioinformatics 10.64898/2026.05.16.725658 medRxiv
Top 0.4%
1.7%

Protein dynamics are essential to biological function, yet whether deep learning models contain information about these dynamics remains an open question. In this study, we quantitatively investigate the capacity of deep learning structure generation methods to predict protein flexibilities by directly comparing residue-level mean squared fluctuation (MSF) profiles derived from structural ensembles with experimental or simulation-informed flexibility profiles. We assembled four diverse benchmark datasets representing different types of structural information, including 70 NMR ensembles, 43 X-ray crystallographic protein pairs in two distinct conformational states, 82 high-resolution cryo-EM structures, and molecular dynamics simulations of 10 proteins. Utilizing AlphaFold3, AlphaFold2, and RoseTTAFold to generate multiple structural models, we applied ranksort normalization to place the profiles on a comparable scale and quantified similarity primarily using cosine and Pearson similarities. Our results demonstrate that the flexibility predictions from deep learning-generated models agree well with experimental data, suggesting that fluctuations in these predicted ensembles can serve as effective proxies for protein flexibility. Notably, AlphaFold3 consistently produced the best results across the datasets. We also observed that flexibility prediction accuracy generally improves as the number of models increases up to 15, and our findings remain robust even when terminal residues are excluded from the analysis. To facilitate broader application, we provide three publicly accessible Jupyter Notebooks to calculate MSF from deep learning outputs. Ultimately, this work provides evidence that deep learning structural ensembles can serve as proxies for protein flexibility.
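The comparison pipeline outlined in this abstract can be sketched in a few lines: compute a per-residue MSF from an ensemble of models, rank-normalize the profile, and compare it with a reference profile using cosine and Pearson similarity. This is a minimal sketch under stated assumptions, not the authors' notebooks: the ensemble is assumed to be already superimposed, one representative atom per residue is used, and "ranksort normalization" is interpreted here as replacing values by their ranks scaled to [0, 1], which the abstract does not define.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def msf(ensemble: Sequence[Sequence[Coord]]) -> List[float]:
    """Per-residue mean squared fluctuation around the ensemble-average position.

    ensemble[m][r] is the coordinate of residue r in model m (pre-aligned).
    """
    n_models = len(ensemble)
    profile = []
    for r in range(len(ensemble[0])):
        mean = [sum(model[r][k] for model in ensemble) / n_models for k in range(3)]
        sq = sum((model[r][k] - mean[k]) ** 2 for model in ensemble for k in range(3))
        profile.append(sq / n_models)
    return profile


def rank_normalize(values: Sequence[float]) -> List[float]:
    """Replace values by ranks scaled to [0, 1] (ties keep input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    scale = max(len(values) - 1, 1)
    ranks = [0.0] * len(values)
    for rank, i in enumerate(order):
        ranks[i] = rank / scale
    return ranks


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da, db = [x - ma for x in a], [y - mb for y in b]
    return cosine(da, db)  # Pearson is the cosine of the mean-centred vectors


# Hypothetical two-model, three-residue ensemble and a made-up reference profile.
models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 2.0)],
]
reference = [0.1, 0.4, 0.9]

predicted = rank_normalize(msf(models))
target = rank_normalize(reference)
print(cosine(predicted, target), pearson(predicted, target))
```

Because rank normalization discards magnitudes, two profiles with the same residue ordering score as identical here; that is the intended effect of placing profiles from different sources on a comparable scale.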